Exploring Feature-Level Duplications on Imbalanced Data Using Stochastic Diffusion Search
Abstract
Swarm intelligence mimics the behaviour of social insects such as bees, wasps and ants to offer a powerful problem-solving metaheuristic, one that arises from a network of interactions among the agents of a multi-agent system and between those agents and their environment. One of the computer algorithms inspired by swarm intelligence is stochastic diffusion search (SDS). SDS uses some of the processes and techniques found in swarms to solve search and optimisation problems. In this paper, a hybrid approach is proposed to deal with real-world imbalanced data. The proposed model involves oversampling the minority class, undersampling the majority class and optimising the parameters of the classifier, a Support Vector Machine (SVM). The model uses the Synthetic Minority Over-sampling Technique (SMOTE) to perform the oversampling and the agents of a swarm intelligence technique, SDS, to perform an ‘informed’ undersampling on the majority classes. The use of this swarm intelligence technique for the undersampling task is investigated, and its impact on the classification results is demonstrated. In addition to comparing the agent-led undersampling with random undersampling, the results are contrasted against other best-known techniques on nine real-world datasets. Further experiments are designed to explore the behaviour of the SDS agents during the undersampling process.
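To make the SDS mechanism mentioned in the abstract more concrete, the following is a minimal sketch of the standard SDS loop (a test phase followed by a diffusion phase) on a toy string-search problem. It is not the paper's undersampling procedure, which the abstract does not specify; the function name `sds_search`, its parameters and the toy problem are all assumptions chosen for illustration.

```python
import random


def sds_search(text, pattern, n_agents=50, n_iters=100, seed=0):
    """Illustrative standard SDS: locate `pattern` inside `text`.

    Hypothetical helper, not taken from the paper. Each agent holds one
    hypothesis (a candidate start offset) and tests it only partially.
    """
    rng = random.Random(seed)
    positions = range(len(text) - len(pattern) + 1)
    # Initialisation: every agent picks a random hypothesis.
    hyps = [rng.choice(positions) for _ in range(n_agents)]
    active = [False] * n_agents

    for _ in range(n_iters):
        # Test phase: each agent checks ONE randomly chosen character
        # of the pattern against its hypothesis (partial evaluation).
        for i in range(n_agents):
            j = rng.randrange(len(pattern))
            active[i] = text[hyps[i] + j] == pattern[j]

        # Diffusion phase: each inactive agent polls a random agent.
        # It copies that agent's hypothesis if the agent is active,
        # otherwise it restarts from a fresh random hypothesis.
        new_hyps = list(hyps)
        for i in range(n_agents):
            if not active[i]:
                k = rng.randrange(n_agents)
                new_hyps[i] = hyps[k] if active[k] else rng.choice(positions)
        hyps = new_hyps

    # The largest cluster of agents marks the best-supported hypothesis.
    return max(set(hyps), key=hyps.count)
```

A call such as `sds_search("x" * 12 + "hello" + "x" * 8, "hello")` lets the agents cluster on the offset where the pattern occurs. In an undersampling setting, one could imagine hypotheses being indices of majority-class samples and the partial test comparing a single feature, but that mapping is speculation and should be checked against the full paper.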
Similar resources
Ensemble-Based Wrapper Methods for Feature Selection and Class Imbalance Learning
The wrapper feature selection approach is useful in identifying informative feature subsets from high-dimensional datasets. Typically, an inductive algorithm “wrapped” in a search algorithm is used to evaluate the merit of the selected features. However, significant bias may be introduced when dealing with a highly imbalanced dataset. That is, the selected features may favour one class while bein...
Stochastic Diffusion Search Review
Stochastic Diffusion Search (SDS), first incepted in 1989, belongs to the extended family of Swarm Intelligence algorithms. In contrast to many nature-inspired algorithms, SDS has a strong mathematical framework describing its behaviour and convergence. In addition to concisely exploring the algorithm in the context of natural swarm intelligence systems, this paper reviews the developments of t...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
A Statistical Study of two Diffusion Processes on Torus and Their Applications
Diffusion Processes such as Brownian motions and Ornstein-Uhlenbeck processes are the classes of stochastic processes that have been investigated by researchers in various disciplines including biological sciences. It is usually assumed that the outcomes of these processes are laid on the Euclidean spaces. However, some data in physical, chemical and biological phenomena indicate that they cann...
Attention through Self-Synchronisation in the Spiking Neuron Stochastic Diffusion Network
The paper discusses ensemble behaviour in the Spiking Neuron Stochastic Diffusion Network, SNSDN, a novel network exploring biologically plausible information processing based on higher order temporal coding. SNSDN was proposed as an alternative solution to the binding problem [1]. SNSDN operation resembles Stochastic Diffusion Search, SDS, a nondeterministic search algorithm able to rapidly lo...